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2  Summary  of  Technical  Results 

The  general  goal  of  this  project  is  the  application  of  concepts  from  topology 
and  sheaf  theory  to  better  e.xplain  and  systematize  some  of  the  intricate  as¬ 
pects  of  algorithms.  The  basic  technique  is  the  adoption  of  a  topological  view 
of  data  structures,  in  contrast  to  the  normal  algebraic  view.  This  approach 
was  successfully  applied  to  the  derivation  of  pattern  matching  algorithms  in 
[Srinivas  93.  Srinivas  92j.  In  this  project,  this  approach  was  extended  to  parsing 
algorithms  for  context-free  grammars.  This  extension  also  yielded  more  insight 
into  what  a  topological  view  of  data  structures  entails,  e.g..  a  connection  to  the 
object-oriented  view  of  datatypes. 


2.1  A  topological  view  of  data  structures 

VV’e  will  briefly  describe  the  characteristics  of  such  a  view.  Consider  the  example 
of  strings.  Normally,  thej-  are  described  algebraically,  as  a  monoid  on  some 
alphabet.  That  is,  strings  are  constructed  inductively  using  the  operations  of 
empty  strings,  singleton  string,  and  concatenation.  Functions  on  strings  are 
defined  using  recursion  on  tdese  con.structors. 

In  the  topological  view,  we  concentrate  on  the  internal  structure  of  strings 
(e.g.,  a  string  is  a  total  order)  and  maps  between  strings  (e.g.,  adjacency¬ 
preserving  map).  The  role  of  induction  using  constructors  is  played  by  the 
covering  relationship:  a  string  can  be  covered  by  a  set  of  (possibly  overlapping) 
substrings.  Now,  functions  are  defined  using  sheaves,  which  assign  a  value  to 
a  string  by  gluing  together  the  values  on  elements  of  a  cover  (the  analogue  of 
recursive  definitions). 

This  approach  is  usually  more  abstract  than  the  algebraic  approach,  and 
allows  us  to  concentrate  on  the  essential  aspects  of  an  algorithm. 

2.2  Application  to  parsing 

The  topology  of  data  structures  can  be  exploited  in  a  similar  manner  in  the 
derivation  of  parsing  algorithms  [Srinivas  b].  Consider  a  context-free  grammar 
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for  strings.  Such  a  grainmiir  specifies  a  set  of  strings  (the  language  generated 
by  the  grammar)  using  both  recursion  and  juxtaposition  of  strings.  The  two 
basic  operations  in  a  context-free  grammar  are 

•  rewriting  via  a  production,  i.e..  replacing  a  non-terminal  by  a  sentential 
form  (e.g.,  5  — >  aSa  applied  to  bSb  yields  haSab),  and 

•  alternation,  which  associates  several  sentential  forms  with  a  non-terminal 
(e.g.,  S  — ►  aSa  \  bSh  |  a  |  6). 

Corresponding  to  these  two  operations  we  can  associate  two  topologies  with 
a  grammar.  Replacing  a  non-terminal  involves  decomposing  a  sentential  form 
and  depends  on  the  topology  of  the  underlying  data  structure  (strings,  trees, 
etc.)  of  the  language  being  generated.  This  topology  is  specific  to  the  data 
structure  and  independent  of  the  grammar. 

.\lternation  in  the  right-hand  side  production  translates  to  a  union  of  lan¬ 
guages.  and  generates  a  topology  on  languages  [Walter  75].  Using  this  topol- 
ogjt  th-P  language  generated  by  a  gramma’'  ran  be  described  as  a  fixed  point 
[Manes  and  .Arbib  86] .  This  topology  is  specific  to  the  grammar  and  (almost) 
independent  of  the  data  structure. 

By  separating  these  two  topologies  and  precisely  characterizing  their  rela¬ 
tionship,  we  can  abstractly  describe  parsing  algorithms  and  provide  an  intuitive 
explanation  of  such  techniques  as  LR  parsing.  The  collection  of  parses  of  a 
given  terminal  string  has  the  structure  of  a  sequence  of  sheaves,  each  sheaf  rep¬ 
resenting  the  collection  of  (partial)  derivation  trees  of  depth  up  to  n  (for  n  =  1, 
2,  3,  . . .).  Different  parsing  algorithms  (top-down,  bottom-up,  etc.)  correspond 
to  different  design  decisions  and  optimizations  used  in  generating  this  sequence 
and  extracting  parses. 

A  sheaf-theoretic  characterization  of  parsing  nicely  exposes  the  interplay 
between  the  algebraic  and  topological  structure  of  a  grammar.  The  sequence 
of  parse  sheaves  above  becomes  a  simplicial  complex  if  composite  productions 
are  included  (e.g.,  if  S  — >  AB  and  .4  —*■  a4,  we  get  a  composite  production 
S  — >  clAB). 

2.3  Connecting  algebraic  and  topologictd  descriptions  of  datatypes 

There  is  a  similar  interplay  in  equationally  described  data  types.  Consider 
an  abstract  data  t)q)e  which  is  characterized  by  a  set  of  operations  satisfying 
some  axioms  (see,  e.g.,  [W'irsing  90]).  The  collection  of  terms  generated  by  the 
signature  of  the  data  type  can  be  cast  as  a  sequence  of  sheaves.  Using  this 
connection,  the  topological  structure  of  an  algebraically  specified  data  type  can 
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be  exposed.  This,  in  turn,  can  provide  some  of  the  facilities  of  obje(  t-orieiired 
programming  style  in  the  framework  of  algebraic  specification  [Srinivct*  dj.  For 
example,  we  can  systematically  convert  between  the  description  of  a  tree  using 
the  operations  empty  and  join  (an  algebraic  description)  and  one  using  a  set 
of  nodes  with  a  parent-child  relationship  satisfying  some  conditions  (an  object- 
oriented  description). 

2.4  Homomorphisms  between  datatypes 

A  topological  description  of  dataypes,  although  conceptually  simple,  needs  a 
lot  of  mathematical  machinery.  On  the  other  hand,  the  algebraic  style  is  sim¬ 
ple  and  has  been  successfully  used  in  the  automated  support  of  the  program 
synthesis  [Smith  90].  One  result  of  this  project  is  the  insight  that  some  of  the 
benefits  of  the  topological  view  can  be  obtained  by  adding  a  facility  for  defining 
homomorphisms  to  algebraic  specifications.  Homormorphisms  relate  the  inter¬ 
nal  structure  of  two  datatypes.  For  example,  if  a  tree  is  described  as  a  set 
of  nodes  together  with  a  parent  relationship,  then  it  can  be  homomorphically 
mapped  into  a  set  of  nested  regions  on  a  display;  this  technique  is  used  in  the 
specification  of  a  graphical  user  interface  in  [Srinivas  and  Jiillig  93].  Note  that 
the  notion  of  homomorphism  used  here  is  somewhat  different  from  that  used  in 
[Bird  87];  the  latter  operates  at  the  level  of  constructors. 

2.5  Sheaves  and  constraint  satisfaction 

The  connection  between  sheaves  and  constraint  satisfaction  was  noticed  in 
[Srinivas  93]  using  Waltz’s  filtering  algorithm  for  scene  labeling.  I  explored 
this  connection  further  in  the  context  of  type  inference  algorithms. 

A  constraint  satisfaction  problem  consists  of  a  hypergraph  in  which  the  nodes 
represent  are  variables  and  the  (hyper)  arcs  represent  constraints  [Montanari  and  Rossi  91]. 
Each  node/ variable  is  labeled  by  a  set  of  admissible  values,  and  each  arc  by  a 
relation  which  specifies  a  constraint.  A  solution  to  such  a  problem  is  an  assign¬ 
ment  of  values  to  variables  which  satisfies  all  the  constraints. 

Consider  the  site  of  all  subgraphs  of  a  hypergraph  with  inclusions  as  arrows 
and  surjective  families  as  covers.  Then  the  solutions  to  a  constraint  satisfac¬ 
tion  problem  form  a  sheaf:  any  consistent  assignment  must  be  assembled  from 
consistent  parts.  Constraint  satisfaction  algorithms  search  for  consistent  as¬ 
signments  of  values  to  variables,  a  process  which  can  be  succinctly  described  as 
sheafification  starting  from  a  functor  which  only  labels  each  variable  with  a  set 
of  admissible  values  and  each  arc  with  a  relation. 

For  the  case  of  type  inference,  we  treat  an  expression  tree  as  a  hypergraph: 
each  node  in  the  tree  corresponds  to  a  hypergraph  node,  and  each  function 
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corresponds  to  a  hyperarc  connecting  the  arguments  and  the  result.  Each  node 
is  labeled  by  its  possible  types,  and  each  hyperaic  by  the  possible  types  of 
the  function.  Type  inference  consists  of  finding  compatible  families  of  type 
assignments.  If  polymorphism  is  included,  the  set  of  possible  types  for  a  node 
may  be  infinite;  thus  it  has  to  be  represented  intensionally  as  an  expression. 
In  this  case,  finding  a  compatible  family  corresponds  to  finding  a  limit,  which 
in  turn  requires  equalizers.  Equalizers  in  this  context  are  nothing  but  most 
general  unifiers  [Rydeheard  and  Burstall  38].  This  example  illustrates  the  nice 
interplay  between  the  topology  of  the  expression  tree  and  the  algebra  of  the 
type  expressions,  a  situation  ideal  for  sheaves. 
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3  Publications,  Presentations,  Reports 

Publications; 

1.  Srinivas,  Y.  V.  Derivation  of  a  parallel  matching  algorithm.  In  Sec¬ 
ond  International  Conference  on  Mathematics  of  Program  Construction 
(Oxford,  UK,  July  1992),  Lecture  Notes  in  Computer  Science,  Vol.  669. 
Springer- Verlag,  1993,  pp.  323-343. 

Abstract:  We  present  a  derivation  of  a  parallel  version  of  the  Knuth-Morris- 
Pratt  algorithm  for  finding  occurrences  of  a  pattern  string  in  a  target  string. 
We  show  that  the  failure  function,  the  source  of  efficiency  of  the  sequential 
algorithm,  is  a  form  of  search  in  an  ordered  domain.  This  view  enables  the  gen¬ 
eralization  of  the  algorithm  both  beyond  sequential  execution  and  the  string 
data  structure.  Our  derivation  systematically  uses  a  divide-and-conquer  strat¬ 
egy.  The  computation  tree  so  generated  can  be  mapped  onto  time,  yielding  a 
naive  sequential  algorithm,  onto  a  processor  tree,  yielding  a  parallel  algorithm, 
or  onto  a  data  structure,  yielding  the  failure  function. 

2.  Srinivas,  Y.  V.  A  sheaf- theoretic  approach  to  pattern  matching  and 
related  problems.  Theoretical  Comput.  Sci.  112  (1993),  53-97. 

Abstract:  We  present  a  general  theory  of  pattern  matching  by  adopting  an  ex- 
tensional.  geometric  view  of  patterns.  Representing  the  geometry  of  the  pattern 
via  a  Grothendieck  topology,  the  extension  of  the  matching  relation  for  a  con¬ 
stant  target  and  varying  pattern  forms  a  sheaf.  We  derive  a  generalized  version 
of  the  Knuth-Morris- Pratt  string  matching  algorithm  by  gradually  converting 
this  extensional  description  into  an  intensional  description,  i.e.,  an  algorithm. 
The  generality  of  this  approach  is  illustrated  by  briefly  considering  other  ap¬ 
plications:  Earley’s  algorithm  for  parsing.  Waltz  filtering  for  scene  analysis, 
matching  modulo  commutativity,  and  the  n-queens  problem. 

3.  Jullig,  R.,  and  Srinivas,  Y.  V.  Diagrams  for  software  synthesis.  In  Pro¬ 
ceedings  of  the  8th  Knowledge-Based  Software  Engineering  Conference 


^Chicago,  IL,  September  20-23,  1993),  IEEE  Computer  Society  Press. 
1993,  pp.  10-19. 

Abstract:  We  describe  the  formal  environment  at  Kestrel  for  synthesizing 
programs.  We  show  that  straightforward  formalization,  persistently  applied 
at  all  levels  of  system  description  and  system  derivation,  produces  a  scalable 
architecture  for  a  synthesis  environment.  The  primitive  building  blocks  of  our 
framework  are  specifications,  which  encapsulate  types  and  operations,  and  spec¬ 
ification  arrows,  which  are  relations  between  specifications.  The  design  of  a  sys¬ 
tem  is  represented  as  a  diagram  of  specifications  and  arrows.  Synthesis  steps 
manipulate  such  diagrams,  for  example,  by  adding  design  detail  to  some  specifi¬ 
cation.  or  by  building  new  tliagrams.  A  design  history  is  a  diagram  of  diagrams. 
Thus,  we  have  a  formal,  knowledge- based,  and  machine-supported  counterpart 
to  such  software  engineering  methodologies  as  CASE  and  OOP. 

Presentations: 

1.  Homomorphisms  Between  .\lgebraic  Specifications.  Presentation  given  at 
the  IFIP  WG2.1  Meeting,  Winnipeg,  Canada,  May  1993. 

Abstract:  We  introduce  the  specification-level  counterparts  to  homomorphisms 
between  algebras.  These  take  the  form  of  homomorphism  specifications  (hom- 
specs).  which  specify  a  class  of  homomorphisms.  Homspecs  augment  the  al¬ 
gebraic  specification  framework  by  providing  a  way  of  encapsulating  families 
of  related  functions.  We  briefly  indicate  several  applicatins  where  homomor¬ 
phisms  are  useful;  (1)  inductive  function  definitions,  e.g..  size  of  a  sequence; 
(2)  specification  of  divide- and-conquer  algorithms,  e.g..  mergesort;  (3)  relat¬ 
ing  structured  objects,  e.g.,  subtrees:  (4)  abstraction/representation  functions 
which  implement  one  specification  in  terms  of  another,  e.g.,  set-as-list;  (5)  spec¬ 
ifying  constraints  in  a  graphical  display. 

Reports: 

1.  Sriniv.\3,  Y.  .\ND  JULMC,  R.  Formal  Specification  of  a  Graphical 
User  Interface,  Technical  Report  KES.U.93.4,  Kestrel  Institute.  Palo  .A.ito, 
May  1993. 

Abstract:  Formal  development  technology  has  advanced  over  the  past  two 
decades  from  the  derivation  of  small,  functional  programs  to  the  stage  where 
it  can  be  applied  to  realistic  problems.  We  specify  an  interactive  user  interlace 
for  displaying  internally  stored  data  structures.  We  show  how  the  code  for 
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propagating  the  changes  to  the  internal  representation  or  the  ili>play  can  be 
formally  derived. 
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